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In the last two decades, many random graph models have been 
proposed to extract knowledge from networks. Most of them look for 
communities or more generally clusters of vertices with homogeneous 
connection profiles. While the first models focused on networks with 
binary edges only, extensions now allow to deal with valued networks. 
Recently, new models were also introduced in order to characterize 
connection patterns in networks through mixed memberships. This 
work was motivated by the need of analyzing a historical network 
where a partition of the vertices is given and where edges are typed. 
A known partition is seen as a decomposition of a network into sub- 
graphs that we propose to model using a stochastic model with un- 
known latent clusters. Each subgraph has its own mixing vector and 
sees its vertices associated to the clusters. The vertices then connect 
with a probability depending on the subgraphs only, while the types 
of the edges are assumed to be sampled from the latent clusters. 
A variational Bayes expectation-maximization algorithm is proposed 
for inference as well as a model selection criterion for the estimation 
of the cluster number. Experiments are carried out on simulated data 
to assess the approach. The proposed methodology is then applied 
to an ecclesiastical network in merovingian Gaul. An R code imple- 
menting the inference algorithm is available from the authors upon 
request. 

1. Introduction. Since the original work of Moreno (1934) on sociograms, 
network data has become ubiquitous in Biology (Albert and Barabasi, 2002; 
Milo et al., 2002; Palla et al., 2005) and computational social sciences (Snijders and Nowicki, 
1997). Applications range from the study of gene regulation processes to 
that of social interactions. Network analysis was also applied recently to a 
medieval social network in Villa, Rossi and Truong (2008), where the au- 
thors find a clustering of vertices through kernel methods. Both determin- 
istic and probabilistic methods have been used to seek structure in these 
networks, depending on prior knowledge and assumptions on the form of 
the data. For example, Hofman and Wiggins (2008) looked for a partition 
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of the vertices where the clusters exhibit a transitivity property. The model 
of Handcock, Raftery and Tantrum (2007) on the other hand assumes the 
relations to be conditioned on the projection of the vertices in a latent social 
space. Notable among the community discovery methods, though asymptot- 
ically biased (Bickel and Chen, 2009), are those based on the modularity 
score given by Girvan and Newman (2002). 

Many of the other currently used methods derive from the stochastic 
block model (SBM) (Nowicki and Snijders, 2001), which is a probabilistic 
generalization (Fienberg and Wasserman, 1981) of the method applied by 
White, Boorman and Breiger (1976) to Sampson's famous monastery data. 
SBM assumes that each vertex belongs to a hidden cluster and that con- 
nection probabilities between a pair of verices depend exclusively on their 
clusters, as in Frank and Harary (1982). The parameters and clusters are 
then inferred to optimize a criterion, usually a lower bound of an inte- 
grated log-likelihood. Thus, Latouche, Birmele and Ambroise (2011) used an 
approximation of the marginal likelihood, while Daudin, Picard and Robin 
(2008) considered a Laplace approximation of the integrated classification 
likelihood. A non parametric Bayesian approach was also introduced by 
Kemp et al. (2006) to estimate the number of clusters while clustering the 
vertices. SBM was extended by the mixed membership stochastic block 
model (MMSBM) (Airoldi et al., 2008), which allows a vertex to belong to 
different clusters in its relations towards different vertices, and by the over- 
lapping stochastic block model (Latouche, Birmele and Ambroise, 2011), 
which allows a vertex to belong to no cluster or to several at the same time. 
More recent works focused on extending MMSBM to dynamic networks 
(Xing, Fu and Song, 2010), or dealing with non-binary networks, such as 
networks with weighted edges (Soufiani and Airoldi, 2012). Goldenberg, Zheng and Fienberg 
(2010) provide an extensive review of statistical network models. 

However, one kind of network which, to the best of our knowledge, has 
not been explored yet is when a partition of the nodes is observed and bears 
some importance in their behaviour. For example, when analyzing a world- 
wide social network describing social interactions between individuals, it is 
reasonable to assume that different countries, or at a different scale, different 
regions of the world, will have different connectivity patterns. We might also 
observe the same kind of phenomenon between different scientific fields in a 
citation network. 

Our aim in this paper is to provide insight into the relationships between 
ecclesiastics and notable people in the kingdoms that made up Merovingian 
Gaul, by analyzing a network characterizing their different kinds of relations. 
Specifically, the data set focuses on the relationships between individuals 
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built during the ecclesiastical councils which took place in Gaul during the 
6th century. These councils were convened under the authority of a bishop to 
discuss specific questions relating to the Church. Though consisting mainly 
of clergymen, laics would also occasionally take part, as representatives of 
the secular power or experts in the questions discussed. These assemblies 
shaped a significant part of that period, and we are interested in discovering 
how they reflected the relationships between various groups of individuals. 
For this network, extra information on the vertices, namely a geographical 
partition, is available, associating each individual to a specific kingdom. 
This partition induces a decomposition of the network into subgraphs and 
wc aim at modelling the connection pattern of each subgraph through latent 
clusters. 

Thus, in this paper, we propose a new model, that we call the random sub- 
graph model (RSM), for the analysis of directed networks with typed edges 
for which a partition of the vertices is available. On the one hand, wc con- 
sider that the probability of observing an edge between two vertices depends 
solely on the subgraphs to which the vertices belong. On the other hand, 
we assume that each vertex belongs to a hidden cluster, with a probability 
depending on its subgraph. Then, if a relation is present, its type is drawn 
from a multinomial distribution whose parameters depend on the clusters to 
which the vertices belong. Let us emphasize that the latter property allows, 
once the inference is done, to compare the different subgraphs. 

The article is organized as follows. The random subgraph model is pre- 
sented along with its inference algorithm in Section 2, then tested on sim- 
ulated data and compared to other models in Section 3. Our model is then 
applied to the ecclesiastical network and the results are analyzed from the 
historical point of view in Section 4. Concluding remarks and possible ex- 
tentions are finally discussed in Section 5. 

2. The random subgraph model. We consider a directed graph Q 

with vertices represented by its N x N adjacency matrix X along with a 
known partition V of the vertices into S classes. Our goal is to cluster the 
network into K groups with homogeneous connection profiles, i.e. estimating 
a binary matrix Z such that = 1 if vertex i belongs to cluster k, 
otherwise. 

Let us now detail the notations. Each edge Xij, describing the relation 
between the vertices i and j, is typed, i.e. takes its values in a finite set 
{0, . . . , C}. Note that X^j = corresponds to the absence of an edge. We 

assume that Q does not have any self loop and therefore the entries Xa will 
not be taken into account. In order to simplify the notations when describing 
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Notations 


Description 


X 


Adjacency matrix. Xij € {0, . . . , C} indicates the edge type 


A 


Binary matrix. Aij = 1 indicates the presence of an edge 


Z 


Binary matrix. Z^k = 1 indicates that i belongs to cluster k 


N 


Number of vertices in the network 


K 


Number of latent clusters 


S 


Number of subgraphs 


C 


Number of edge types 


a 


Qsfe is the proportion of cluster k in subgraph s 


n 


Ilfcfc is the probability of having an edge of type c 




between vertices of clusters k and I 


7 


7t-s probability of having an edge between vertices of subgraphs r and s 


Table 1 

Summary of the notations used in the paper. 



the model, we also consider the binary matrix A with entries Aij such that 
= 1 ^ ^ 0. 
We also emphasize that the observed partition V induces a decomposition 
of the graph into subgraphs where each class of vertices corresponds to a 
specific subgraph. We introduce the variable Sj which takes its values in 
{1, . . . , S*} and is used to indicate in which of the subgraphs vertex i belongs, 
for i G {!,..., A^}. 

2.1. The probabilistic model. The data is assumed to be generated in 
three steps. First, the presence of an edge from vertex i to vertex j is sup- 
posed to follow a Bernouilli distribution whose parameter depends on the 
subgraphs Sj and sj only: 

Ai,j ~ >3{-fsi,sj)- 

Each vertex i is then associated to a latent cluster with a probability de- 
pending on Si. In practice, if we assume for now that the number K of latent 
clusters is known, the variable Zj is drawn from a multinomial distribution: 

Z,~M(l;a,J, 

where 

K 

Vs G l,...,S',^a5fe = 1. 

fe=i 

A notable point of the model is that we allow each subgraph to have different 
mixing proportions cxg for the latent clusters. We denote hereafter a = 
(ai, . . . ,0:5'). Finally, if an edge between i and j is present, i.e. Aij = 1, 
its type Xij is sampled from a multinomial distribution with parameters 




Fig 1 . Graphical representation of the RSM model. 

depending on the latent clusters. Thus, if i belongs to cluster k and j to 
cluster /: 

Xi,j\ZikZji = IjAij = 1 ~ M{l,Ilki), 
where the sum over the C types of each vector 11^; = (Jlkiij ■ ■ ■ ■, ^kic) is: 

c 

v(fc,0G{i,...,if}',5^nHc = i. 

c=l 

If there is no edge between the two vertices, the entry Xij is simply set to 
Xij = Aij = 0. Figure 1 presents the graphical model associated with the 
RSM model and all notations are summarized in Table 1. 

The model is therefore defined through the joint distribution: 

p(X, A, Z I a, 7, n) = p(X, A I Z, 7, n)p(Z I cx) 

= p(X|A,Z,n)p(A|7)p(Z|a), 

where 

K C 

p(X I A, Z, n) = JJ n^^^')^'"^' S(Xii=c)A,iZikZj,^ 
k,l c=l 

and 

N 

p(A|7) = n>1:^.(l-7n,r,)'-^-'- 

Finally, 

N K 

p{z I «) = n n 

i=l k=l 
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We refer to the appendix for the detailed calculation of the complete data 
log likelihood associated to the RSM model. 

2.2. Bayesian framework. In such a context, a popular way to perform 
the inference of the model is to consider a Bayesian framework and to use a 
variational Bayes expectation-maximization (VBEM) algorithm. In order to 
obtain analytical optimization equations, we consider conjugate prior distri- 
butions. Thus, since Zj is sampled from a multinomial distribution, we rely 
on a Dirichlet prior to model the parameters cxg'- 

p{as) = Dir • • • , Xsi^) ,Vs E {!,..., S"}. 

A similar distribution is used as a prior distribution for the parameters 11^/: 

piUki) = Dir (Hh; Ell,, • • • , ^lic) , V(A^, I) G {1, . . . , K}^ 

If no prior information is available, a common choice in the literature consists 
in fixing the hyperparameters of the Dirichlet to 1/2, i.e. x^k ~ ^/^^^(s, /c) 
and Hfc/c = 1/2, V(A;, /, c). Such a distribution corresponds to a non informa- 
tive Jeffreys prior distribution which is known to be proper (Jeffreys, 1946). 
A uniform distribution can also be obtained by setting the hyperparameters 
to 1. 

Finally, since the presence or absence of an edge between a pair of ver- 
tices is drawn from a Bernoulli distribution, we rely on a beta prior for the 
parameters jrs- 

pi'jrs) = Beta(7^s; aj!^, 6°J, V(r, s) G {1,... ,3}"^. 

Again, if no prior information is available, both hyperparameters a^g and 
b^g can be set to 1/2 or 1 to obtain non informative prior distributions, 
respectively a Jeffreys or a uniform distribution. 

2.3. Inference with the variational Bayes EM algorithm. Given the ob- 
served matrices X and A, we aim at estimating the posterior distribution 
p(Z, Q, 7, n |X, A), which in turn will allow us to compute a maximum a 
posteriori estimate of the clustering structure Z as well as the parameters 
{a, 7, n). Because this distribution is not tractable, we rely on a variational 
approximation. Thus, given a distribution (/(Z, q, 7, 11), the marginal like- 
lihood can be computed in two terms: 

logp(X, A) = £{q) + KL{q{.)\\p{.\ X)), 
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Algorithm 1 VBEM algorithm for the RSM model 
Initialize matrix r = Z with kmeans 
Initialize hyperparameters 6" = (x'', 7*^, H") 
Compute £.{q) 
while |6I"™ -6I°"| > e do 

E step: update r 

M step: update 0""" = (x, 7, H) 

Compute £ 
end while 



where C is defined as follows: 

m = Y.I g(z, 7, n) iog( ^^^;^' ^' "^'^^^ )dcxd-fdu, 

and the KL divergence is given by: 

KL(g(.)|b(.|X)) = -^ / g(Z, Q, 7, n) log( ^^^'J'' ^' " ' )d a d7 rfn. 

2 7a,7,n g(Z,a,7,n) 

Finding the best approximation of the posterior distribution p(Z, ct, 7, 11 1 X, A) 
in the sense of the KL divergence becomes equivalent to finding g(-) that 
maximizes the lower bound C{q) of the integrated log-likelihood. To obtain 
a tractable algorithm, we assume that q{.) can be fully factorized, that is: 

g(Z,a,7,n)= n^^Z^) [\\(l{o^s)J[q{ls,t)]j{q{Ilk,l). 

\i=l J \s=l t=l J k,l 

The functional optimization of the lower bound with respect to q{-) is per- 
formed using a VBEM algorithm (see Algorithm 1). All the optimization 
equations are given in the appendix. 

2.4. Initialization. The VBEM algorithm, though useful in approximat- 
ing posterior distributions of graphical models, is only guaranteed to con- 
verge to a local optimum (Bilmes, 1998). Strategies to tackle this issue in- 
clude simulated annealing and the use of multiple initializations (Biernacki, Celeux and Govaert, 
2003). In this work, we choose the latter option. In order to have a better 
chance of reaching a global optimum, VBEM is run for several initializations 
of a kmeans like algorithm with the following distance d{i,j) between the 
vertices i and j: 

N N 

(1) dii,j) = ^{Xih + Xjh)AihAjh + ^(^hi + Xhj)AhiAhj. 

h=l h=l 
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The first term looks at all possible edges from i and j towards a third vertex 
h. If both i and j are connected to h, i.e. Ai^Ajh = 1, the edge types 
and Xjh are compared. By symmetry, the second term looks at all possible 
edges from a vertex h to both i as well as j, and compare their types. Thus, 
the distance computes the number of discordances in the way both i and j 
connect to other vertices or vertices connect to them. The algorithm starts 
by sampling the cluster centers among all the vertices of the network. It then 
iterates a two-step procedure until convergence of the cluster centers. In the 
first step, the vertices are classified into the cluster with the closest center. 
Each cluster center is then associated to a vertex minimizing its distance 
with all the vertices of the corresponding cluster. In Section 3, we illustrate 
that the distance defined in (1) and the associated kmeans like algorithm 
leads to a a good starting point for the VBEM algorithm. 

2.5. Choice of K. So far, the number K of latent clusters has been as- 
sumed to be known. Given K, we showed in Section 2.3 how an approx- 
imation of the posterior distribution over the latent structure and model 
parameters could be obtained. We now address the problem of estimating 
the number of clusters directly from the data. Given a set of values of K, 
we aim at selecting K* for which the marginal log likelihood logp(X \K) is 
maximized. However, because this integrated likelihood involves a marginal- 
ization over all the model parameters and latent variables, it is not tractable. 
Therefore, we propose to replace the marginal likelihood with its variational 
approximation, as in Bishop (2006); Latouche, Birmele and Ambroise (2009, 
2012). Thus, for each value of K considered, the VBEM algorithm is applied. 
We recall that the maximization of the lower bound induces a minimization 
of the KL divergence. After convergence of the algorithm, the lower bound is 
used as an approximation of logp(X \K) and K is chosen such that the lower 
bound is maximized. We prove in the appendix that, if computed right after 
the M step of the variational Bayes EM, the lower bound has the following 
expression: 

where C{x) = ]^jhl^''''\ if x G and B{a, h) = rS? .V(a, h) G R^. 

3. Numerical experiments and comparisons. In this section, we 
first run experiments aiming at proving the validity of our model, focusing 
on the ability of its inference procedure to find the right clustering. We 
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Fig 2. Example of a RSM network for S = 2 subgraphs (indicated by the node forms), 
C — 3 types of edges (indicated by the edge colors) and K = 3 clusters to identify (indicated 
by the node colors). 



then compare its performance to that of other stochastic models for graph 
clustering. 



3.1. Experimental setup. In order to evaluate the performance of our 
approach, we applied it on data generated according to the RSM model. 
To simplify the parameterization and facilitate the reproducibility of the 
experiments, we constrained the parameters 11 and 7 to have the following 
forms: 



n 



u 



u 



,7 



A 



A 



where A, e E [0,1] and u, v € [0,1]^. With such a parameterization, the 
probability A of an edge within a subgraph is assumed to be common be- 
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Parameters 


Scenario 1 


Scenario 2 


Scenario 3 


N 


100 


100 


100 


S 


1 


1 


3 


c 


3 


3 


3 


K 


3 


3 


3 


a. 


(0.3,0.3,0.4) 


(0.3,0.3,0.4) 




0.5 0.5 
0.5 0.5 
0.5 0.5 




u 


(0.8,0.1,0.1) 


(0.5,0.45,0.05) 


(0.5,0.45,0.05) 


V 


(0.1,0.1,0.8) 


(0.1,0.45,0.45) 


(0.1,0.45,0.45) 


A 


0.2 


0.2 


0.2 


e 


0.06 


0.06 


0.1 



Table 2 

Parameter values for the three types of graphs used in the experiments. 



tween subgraphs and the probabihty e of a connection between different sub- 
graphs is also assumed to be the same for all couples of subgraphs. Similarly, 
the vector u controls the probability of each edge type between nodes of a 
same cluster whereas v defines the edge type probabilities between nodes of 
different clusters. We recall that the prior probabilities of each group within 
each subgraph are given by the parameter cx = (cti, . . . , 0:5). 

Figure 2 presents an example of a network generated this way with pa- 
rameters S = 2, C = 3, K = 3, a = ^'t. , A = 0.6, e = 0.06, 

[O.b U.3 U.IJ 

u = (0.8,0.1,0.1) and v = (0.1,0.3,0.6). This RSM network is made of 30 
nodes with 5 = 2 subgraphs (indicated by the node forms), (7 = 3 types 
of edges (indicated by the edge colors) and K = clusters that have to be 
identified in practice (indicated by the node colors) . 

In order to illustrate, on various situations, that RSM is a relevant model 
and that its corresponding inference procedure provides an accurate estima- 
tion of the true clustering structure, we rely in the following paragraphs on 
three types of graphs, described in Table 3.1. The three scenarii considered 
correspond to different situations ranging from a almost classical setup to a 
more specific one. The first scenario considers networks with no subgraphs 
(5 = 1) and with a preponderant proportion of edges of type 1 {Ui = 0.8) 
and 3 (C/3 = 0.8). The second scenario still considers networks with no sub- 
graphs (5 = 1) but with balanced proportions of edge types. Finally, the 
third scenario considers networks with several subgraphs (5 = 3) and bal- 
anced proportions for edge types. Therefore, the latter case should be the 
more complex situation to fit. 

The VBEM algorithm with multiple initializations, presented in Section 
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Fig 3. Criterion values C{q) vs. number K of groups for a graph simulated according to 
scenario 1. 



2, is used in the following experiments. For a given value of K, the result 
with the best value for C{q) is chosen among the multiple initializations. 
Then, a clustering partition is deduced from the posterior probabilites Tj^ 
using the maximum a posteriori (MAP) rule, i.e. a node is assigned to the 
group with the higest posterior probability. The resulting partition is finally 
compared with the actual partition using the Adjusted Rand Index (ARI) 
(Rand, 1971), which serves as a widely accepted criterion for the difficult 
task of clustering evaluation. 

3.2. Choice of K and inference results. In this first simulation study, we 
aim at evaluating the ability of the lower bound C{q) to serve as a criterion 
for selecting the appropriate number K of clusters. To this end, the VBEM 
algorithm for the RSM model was first run on a graph simulated according 
to scenario 1 for several values of K. The highest criterion value among 
the different initializations obtained for each value of K are presented in 
Figure 3. The figure indicates that K = 2> seems to be the appropriate 
number of groups for the studied network, which is the actual number of 
group. 

We then replicated this experiment over 50 networks, still simulated ac- 
cording to scenario 1, for both verifying the consistency of C{q) and studying 
the clustering ability of our approach. Figure 4 shows the repartition of the 
criterion values (left panel) as well as the associated ARI values (righ panel). 
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Fig 4. Repartition of the criterion (left panel) and ARI (right panel) over 50 networks 
generated with the parameters of the first scenario. 



These results confirm that the lower bound C{q) is a valid criterion for select- 
ing the number of groups. One can also observe that the partition resulting 
from our VBEM algorithm has, for the selected number of groups, a good 
adequation with the actual partition of the data. 

3.3. Comparison with the stochastic block model. Our second set of ex- 
periments compares the performance of RSM to that of other models on 
data drawn according to its generative process. We were interested in the 
comparison with the following models: 

• binary SBM (presence): We fit a binary SBM using the R package 
mixer (Ambroise et al., 2010) on the data by considering only the pres- 
ence of the edges and not the type of the edges. 

• binary SBM (type 1, 2 or 3): We fit a binary SBM, still using the 
Mixer package, on the graphs defined by taking only the edges of one 
type. 

• typed SBM: We consider here a SBM with discrete edges. Although 
SBM was originally proposed in Nowicki and Snijders (2001) with dis- 
crete edges, existing softwares only propose to fit a SBM on binary 
networks. We therefore had to implement a version of the SBM which 
supports typed edges. Note that, in this case, the types of edges are 
in {0, . . . , C}, where corresponds to the absence of a relation. 

• RSM: We run the VBEM algorithm, that we proposed in Section 2 for 
the inference of the RSM model, with the available subgraph partition 
and with 5 random initializations for each run. 
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Method 


Scenario 1 


Scenario 2 


Scenario 3 


binary SBM (presence) 
binary SBM (type 1) 
binary SBM (type 2) 
binary SBM (type 3) 


0.001 ± 0.012 
0.976 ± 0.071 
0.001 ± 0.006 
0.959 ± 0.121 


0.001 ± 0.013 
0.494 ± 0.233 
-0.003 ± 0.006 
0.519 ± 0.219 


0.239 ± 0.061 
-0.372 ± 0.262 
0.179 ± 0.097 
0.367 ± 0.244 


Typed SBM 


0.694 ± 0.232 


0.472 ± 0.339 


0.360 ± 0.162 


RSM 


1.000 ± 0.000 


0.918 ± 0.154 


0.820 ± 0.155 



Table 3 

Average ARI values and standard deviations for binary SBM, typed SBM and RSM 
according to the three simulation scenarn. The results are averaged on 25 simulated 

graphs for each scenario. 



Table 3 presents the average ARI values and standard deviations on 50 
simulated graphs for each scenario and with binary SBM, typed SBM and 
RSM. We point out that the inference is done with the actual number of 
clusters and this for each method. One can observe that, for the first scenario, 
the binary SBM based on the link presences and the type 2 SBM always fail 
whereas type 1, type 3 and typed SBM work pretty well. Those behaviors 
can be explained by the nature of scenario 1 which is a rather easy situation 
with no subgraphs and a predominant presence of type 1 and type 2 links. 
However, we can remark that it seems easier in this case to fit a binary SBM 
on type 1 or type 2 edges than to fit a typed SBM. This is due to the high 
discriminative power of type 1 and type 2 edges in this specific scenario. 
Let us also remark that RSM works perfectly here even though the network 
does not contain any subgraphs. 

Regarding scenario 2, which considers a situation where there is still no 
subgraphs but with more balanced proportions of the different edge types, 
one can first notice that binary SBM and type 2 SBM fail once again. The 
type 1 and type 3 SBM have now a behavior closer to the one of typed 
SBM whereas RSM gives very accurate results once again. Finally, scenario 
3 considers a RSM-type network, i.e. with several subgraphs, and all SBM- 
based algorithms are significantly outperformed by RSM which succeeds 
in exploiting both the information carried by different edge types and by 
the different subgraphs. To summarize, the RSM model and its associated 
VBEM algorithm turn out to be effective on situations ranging from classical 
setups without subgraphs to complex scenarii with subgraphs and typed 
edges. 

4. Ecclesiastical network. This section now focuses on applying the 
RSM model to the ecclesiastical network, that we briefly described in the 
introduction and that initially motivated this work, and on analyzing its 
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results from the historical point of view. 

4.1. Description of the data. Having shown that our variational Bayes 
EM algorithm can be used to infer the hidden clusters of data generated un- 
der the assumptions of the RSM model, we now apply it to analyze relational 
data built around the ecclesiastical councils that took place in Merovingian 
Gaul during the 6th century. Since 511, Kings as well as bishops could call 
for a council, which means that some political, judiciary or legal issues were 
addressed at these occasions, and that laics (kings, dukes or counts for in- 
stance) would attend. During the 6th century, 46 councils took place in Gaul, 
which was divided into the kingdoms of Neustria, Austrasia and Burgondy, 
and the provinces of Aquitaine and Provence (the latter changed hands sev- 
eral times during this period). Although there were mostly local or regional 
councils, attended by individuals from a specific ecclesiastical province, there 
were some national councils convened under the authority of a king. 

We know the composition of some of these councils thanks to the acts 
written at the end of the meeting, and which were signed by all attending 
members. For others, our sources only mention the presence of "bishops, 
abbots and clergymen". For this reason, we had to resort to other types 
of sources to build our database. In addition to the council acts, we used 
narrative texts (among which the famous Ten History Books by Gregory 
of Tours), hagiographies or letters. Let us notice that a "source effect" is 
expected due to the possible overrepresentation of a place (Neustria by Gre- 
gory of Tours or Austrasia by Fredegar) or one individual (in letters or 
hagiographies). The database, that took over 18 months to build, contains 
1331 individuals who held one or several offices in Gaul between the years 
480 and 614, and who we know to have been related or to have met dur- 
ing their lifetime. However, the poverty of the sources only allowed for an 
approximate characterization of the relations between individuals: they are 
either positive, negative, variable or neutral (when the type was unknown). 

We expect the statistical analysis of this network to help us understand 
how the behaviour of an individual can be modeled through their belonging 
to a group. In History, this kind of approach is more common to modernists 
or contemporarists than to medievists who rarely have access to this kind 
of data. 

4.2. Results. The RSM model was run on the network defined by these 
relations, where the subgraphs are the provinces in which the individuals 
lived (Aquitaine, Austrasia, Burgondy, Neustria, Provence or Unknown), 
and the use of the criterion C{q) allowed us to find 6 clusters. Figure 5 
presents the repartition of the different functions in the found clusters. 
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Fig 5. Repartition of the different functions in the found clusters (restricted to the 10 
most frequent functions per cluster). 



In view of these results, we can first analyze the composition of the dif- 
ferent clusters found by the algorithm. First, clusters 1 and 3 appear to be 
made of the people who would attend local assemblies, provincial or diocesan 
councils. The council of Aries which took place in 554, would have had the 
same kind of composition as cluster 3, while that of Auxerre, in 585 could 
well represent cluster 1. Second, clusters 4 and 5 are more characteristic of 
aristocratic assemblies, such as the council of Orange in 529. Third, clusters 
2 and 6 have the same compositions as councils concerned with more po- 
litical issues (those usually convened by a king). Such a council took place 
in Orleans in 511. Let us however notice that cluster 2 is composed of very 
few individuals, which might hurt the relevance of its interpretation. Also, 
we might be able to further our understanding of the composition of these 
clusters by taking into account the similarity of certain functions (such as 
"comte" and "grand"). 

The relations between the different clusters, shown in Figure 6, inform 
us further. Though the limitations expressed above about the roughness of 
the types still apply, they nevertheless provide us with interesting elements 
to confirm the coherence of the proposed model. First, it is natural that we 
should find "neutral" relations at the local level, between clusters 1, 3 and 6. 
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negative variable 




Fig 6. Relations between clusters, for each relation type: negative, variable, neutral and 
positive. 

Indeed, local assemblies were the most documented ones in our sources. On 
the other hand, the links between high level individuals are better known, 
because councils used to settle conflicts between aristocrats, which explains 
the presence of "negative" and "variable" relations. Finally, the positive 
relations between cluster 3, 5 and 6 could represent the personal friendships 
documented by collection of letters between bishops. 

After having described the political background represented by each of the 
clusters, we can compare the organization of the different regions. Figure 7 
presents the cluster repartition in the different provinces. One can observe 
that the clergy and noblemen of the different regions were concerned with 
very different issues: Provence and Burgundy were more concerned with local 
questions (clusters 1 and 3), and less with political ones (clusters 2 and 6). 
The clusters concerned both with local (clusters 1 and 3) and high level (clus- 
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cluster 1 
cluster 2 
cluster 3 
cluster 4 
cluster 5 
cluster 6 




Neustria Provence Unknown Aquitaine Austrasia Burgundy total 



Fig 7. Repartition of the 6 found clusters in the different Provinces. 



ter 6) questions are represented in Aquitaine. Conversely, all levels of power 
are represented in Neustria. This could be the result of a "source effect", 
as mentioned above. Let us also notice that council structures seem similar 
in Austrasia and Aquitaine: sovereigns (kings and queens) are involved in 
Church, and convene councils in order to discuss political questions. 

Some of these observations are confirmed by the estimate of parameter 7 
which is given in a log scale by Figure 8. First, it shows a greater frequency 
of relations between Aquitaine and Neustria, which comes both from a ge- 
ographical and political proximity (Aquitaine is absorbed into Clovis' king- 
dom in 507, then divided and absorbed by Neustria in 511). One can also 
see there another example of "source" effect, as our main source, Gregory of 
Tours, was bishop in Neustria and raised in Aquitaine (next to his uncle, the 
bishop of Clermont), which gave him a good knowledge of both provinces. 
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Neustria Provence Unknown Aquitaine Austrasia Burgundy 

Fig 8. Estimated values for the parameter 7 (in log scale). 

More enlightening, is the relative disconnection of Burgondy and Provence, 
especially in regard to the provinces of Austrasia and Neustria, both heavily 
connected. 

4.3. Conclusion from the historical point of view. A first analysis of the 
results of the RSM model confirms two well known general facts. Indeed, 
our results confirm the preponderance of local assemblies in 6th century 
Gaul and the "source effect". Nevertheless, further analysis of the found 
clusters and their relations yields a better understanding of the period. In 
particular, the composition of the found clusters reflect different archetypes 
of councils, and different levels of political concerns. Our results have also 
highlighted that the type of concerns of each province are closely related to 
the frequency of their communications with others. However, two limitations 
to these results remain. First, we are limited by the scarcity of the historical 
documentation. It would be interesting to see whether the use of more precise 
types of relations (ecclesiastical or secular, through which media, . . . ) could 
improve the results. Second, it would also be interesting for the model to 
take into account temporal evolutions of the relations and clusters. 
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5. Conclusion and further work. In this work, we proposed a new 
stochastic graph model, the random subgraph model, to deal with networks 

where a vertex behavior is influenced by an observed partition variable. 
We derived a variational Bayes EM algorithm to infer the model parame- 
ters from data and applied it to an ecclesiastical network from Merovingian 
Gaul. The results of the fitted RSM enlightened us on the different levels of 
power present at this time in Gaul, and on the different power structures 
of different regions. Let us highlight that the RSM model allows in addition 
the comparison of subgraphs through the model parameters, in particular 
the cluster proportions. One aspect, however, that RSM does not currently 
address is the temporality of the data. Since this aspect can be found in 
many of the data sets we wish to apply the RSM model to, we believe that a 
natural continuation of this work would be a dynamic extension of the RSM 
model. Moreover, we plan to introduce a Chinese restaurant process on the 
latent cluster structure in order to automatically estimate the number of 
clusters while clustering the vertices. Finally, we would like to considera- 
tion the problem of visualizing such networks with typed edges and known 
subgraphs. 

APPENDIX A: VARIATIONAL BAYES 

In this final section, we detail the computations that lead to the update 
rules given in Section 2, and provide an explicit expression of the criterion 

Proposition A.l. The complete data log likelihood the RSM model is 
given by: 

N C K 

logp(X, A, Z, a, 7, n) = ^ ^ {5{Xij = cjZ^kZji log(nfe,,)} 

i^j c=l k,l 
N K 

i=l k=l 
N 

+ i^ij log(7ri,r, ) + (1 - Aij) log(l - 7,,,,,)} 
S S K 

+ Ylogpias) + Ylogp{'yrs) + Yiogp{Uki). 

s=l r,s k,l 
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Proof. 

logp(X, A, Z, a, 7, n) = logp(X | A, Z, n) + logp(A 1 7) + logp(Z | a) + logp(a) + logp(7) 

+ logp(n) 

NCR 

= Y.Y.Y1 ^^(^ij = ^)ZikZjl log(nwe)} 

i^j c=l k,l 
N K 

+ ^^Zik log(a^,,fe) 

1=1 k=l 
N 

+ H {^i3 log(7r,,r^) + (1 - Aij) log(l - 7^,,^.)} 
S S K 

+ ^logp(as) + ^log]9(7„) + ^log]3(nw). 

Proposition A. 2. The VBEM update step for the distribution q{'^rs) is 
given by: 

qilrs) = Beta(7rs; ),y{r,s)e{i,...,sy, 

where 

Urs = + (Aij), 
ri=r,rj=s 

and 

ri=r,rj=s 

Proof. 

log qilrs) = Ez^^ n[logP(X' A, Z, a, 7, n)] + k 

= J2 Mulog(7r-s) + (l- Aj)log(l-7rs)} 

logp(7rs) + «^ 

= ^ {Aij log{jrs) + (1 - Aj) log(l - Jrs)} 

ri=r,rj=s 

(a°, - 1) log(7,,) + (6°, - 1) log(l - 7„) + K 
= («°.-l+ Ai,)log(7,,) 

rf=r,r3=s 

+ (50,-1+ ^ (1 _ A. .)) log(l - 7,,) + 
ri=r,rj=s 
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where /« is a constant term. Hence, the functional form of the variational 
approximation qi'jrs) corresponds to a Beta distribution with updated hy- 
perparameters: 



ri=r,r.i=s 



and 



ri=r,rj 



Proposition A. 3. The VBEM update step for the distribution q{Zi) is 
given by: 

q{Zi)=M{Zf,l,Ti),yi, 

where 



Tik CX exp {^{Xri,k) - V'EXri.oj 

{N C K / C \ 

X] XI '^(^^J = I "^(-fe'c) - "kin) 

j^i c=l 1=1 \ u=l J 



{N C K 
XI X X "^(^i* = ^^^^^ [ ^i-lkc) - V'lX 
j^i c=l 1=1 



C 



u=l 



^Iku) 



Proof. 

log q{Zi) = ^z\\cx,'y,n Pog^^(^' A, Z, a, 7, n)] + 

= E^vi n 



TV C 

EE 

j=l c=l 



K 



zV,n 



N C 

EE 

_i=l c=l 
■ K 

.k=l 



6{Xij = c) ^ ^ife-^jz log(nfcic) 

k,l 
K 

S{Xji = C)Y^ ZjkZii log(nfe;c) 



k,l 
+ K 



= ^ZjfcEa[log(a^,,fc)] 



k=l 
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N C K 

+ E E E ^ik5{Xij = c)¥.z\i,ji[Zji \og{Tlkic)] 

j=l c=l l,k 
N C K 

j=l c=l l,k 

k=l \ 1=1 J 

K ( N C K / C \ 

+ E 1 E E E ^^^ii = ( V'(Swc) - V'CE "*^'") I 

A;=l 1 j^i c=l «=1 \ u=l / 

N C K / C \ 

E E E = c)rji I i^i^ikc) - v-cE I 

ijti c=l i=l \ u=l / 



where k is a constant term. Hence, the functional form of the variational ap- 
proximation q{Zi) corresponds to a multinomial distribution, with updated 
parameters: 

Tik OC exp ^{Xri,k) - V'(E ^ri,l)^ 

{NCR I C \ 

E E E '^(^^j = I V'(Sfeic) - V'CE 
i^i c=l Z=l V u=l / 

' N C K / C \ I 

E E E "^(^j^ = I v-c^ifec) - v-cE f • 

j^j c=l i=l V u=l / I 



+ exp < 

Proposition A. 4. The VBEM update step for the distribution q{ots) is 
given by: 

q{cxs) = Dir(as;Xs),Vs G {1,... ,5}, 

where 

N 

Xsk = X°k + ^^{ri = s)Tik,yk e {1,...,K}. 

1=1 



the random subgraph model 

Proof. 

\ogq{as) = Ez_^\.^^n[logl'(^'A,Z,Q;,7,n)] + K 

N K 
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=1 k=l 



K N 



K 



k=l i=l k=l 
K ( n ^ 



fc=l 



where k is a constant term. Hence, the functional form of the variational 
approximation q{a.s) corresponds to a Dirichlet distribution with updated 
hyperparameters: 



N 



Xsk = X°sk + Y^i^i = s)Tik,yk e {1,...,K}. 



i=l 



Proposition A. 5. The VBEM update step for the distribution q{U.u) 
is given by: 



where 



N 



^klc 



^klc + Y^iXij = c)TikTjU^C G {1, . . . , C}. 



Proof. 

log q{Ilk,i) = IEz^„_^^n\fe' [logP(X' A, Z, a, 7, n)] + k 

N C 

= ^■z,ji\ki [X ^K^ij = c)ZikZji log(nfc/c)] + logpiUki)] + K 



c 



N 



c=l 

c 

= X log(nwc) < 



c 



c=l 



c=l 



N 



^klc-'^ + Yl^i^ij=c)rikrji } + 
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where k is a constant term. Hence, the functional form of the variational 
approximation q(n.ki) corresponds to a Dirichlet distribution with updated 
hyperparameters: 

N 

^klc = '^klc + XI ^^^^^ = c)Tik'rji,yc G {1, . . . , C}. 

Proposition A. 6. When computed right after the M step, the lower 
bound of the marginal likelihood is given by: 

where C{x) = ^^f^l"^^ ifxeR^ and B{a, b) = ?ff)\ v(a, 6) G M^. 
Proof. The lower bound is given by: 

rt \ w n /^'l^i A,Z,Q:,7,n) 

Ciq) = Ez,.,^,n[log( ^(z,o,7,n) 

where 

iog( ''^''^;^;f;"n;"^ ) = E {^^. iog(7...,) + (i - a,) iog(i - 7..,.,)} 

N C K 

+ E E E = ^)^ikZ,l log(nfcie)} 

i^i c=l fe,i 

,p(Z,a,7,n) 



+ log( 



g(Z,a,7,n)^' 



and 



°^^g(Z,a,7,n)^ g(Z) ^ + °^^g(a,7,n)' 
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B{ars, br 



i=l k=l 
S 



-1 c'(xO)nf=i«r"''^ 



^ . ( , I ;k,.7 1 

— )• 



k,i <^(Sw)n^=in;,7 



If G M-^ then C(x) = where r(-) is the gamma function. More- 

over, if (a, 6) G then 5(a, 6) = ^^J^. Finally 



+E 



l>°r,-br,+ Yl (l-'^y) log(l-7r,) 



ri=r,ri=s 



s=l k=l I V i=l / J 



K C 



+ E E 1 -kic - ^kic + E ^(^ij = ^)^ikZji iog(n. 



klc) 



k,l c=l 

TV K 



^EM§i;).E.o.(§i;|.. 



Therefore, that if C{q) is computed right after the M step: 

A,)^E>o.(||^).E.o.(g||;,.E.-(gii)-EE-.-(^^^^ 



26 



Y. JERNITE ET AL. 



REFERENCES 

AiROLDi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed member- 
ship stochastic blockmodcls. The Journal of Machine Learning Research 9 1981 2014. 

Albert, R. and Barabasi, A. L. (2002). Statistical mechanics of complex networks. 
Modem Physics 74 47-97. 

Ambroise, C, Grasseau, G., Hoebeke, M., Latouciie, P., Miele, V. and Picard, F. 
(2010). The mixer R package. http://cran.r-pro'ject.org/web/packages/mixer/. 

Bickel, P. ,J. and Chen, A. (2009). A nonparanictric view of network models and 
Newman-Girvan and other modularities. Proceedings of the National Academy of Sci- 
ences 106 21068-21073. 

BiERNACKi, C., Celeux, G. and Govaert, G. (2003). Choosing starting values for the 
EM algorithm for getting the highest likelihood in multivariate Gaussian mixture mod- 
els. Computational Statistics and Data Analysis 41 561-575. 

BiLMES, J. A. (1998). A gentle tutorial of the EM algorithm and its application to pa- 
rameter estimation for Gaussian mixture and hidden Markov models. International 
Computer Science Institute 4 126. 

Bishop, CM. (2006). Pattern recognition and machine learning. Springer- Ver lag. 

Daudin, J. J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. 
Statistics and Computing 18 173-183. 

Fienberg, S. E. and Wasserman, S. S. (1981). Categorical data analysis of single so- 
ciometric relations. Sociological Methodology 12 156-192. 

Frank, O. and Harary, F. (1982). Cluster inference by using transitivity indices in 
empirical graphs. Journal of the American Statistical Association 835-840. 

GiRVAN, M. and Newman, M. E. J. (2002). Community structure in social and biological 
networks. Proceedings of the National Academy of Sciences 99 7821. 

GoLDENBERG, A., ZiiENG, A. X. and Fienberg, S. E. (2010). A survey of statistical 
network models. Now Publishers. 

Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering 
for social networks. Journal of the Royal Statistical Society: Series A (Statistics in 
Society) 170 301-354. 

HoFMAN, J. M. and WiGGiNS, C. H. (2008). Bayesian approach to network modularity. 

Physical review letters 100 258701. 
Jeffreys, H. (1946). An invariant form for the prior probability in estimations problems. 

In Proceedings of the Royal Society of London. Series A 186 453-461. 
Kemp, C, Tenenbaum, J. B., Griffiths, T. L., Yamada, T. and Ueda, N. (2006). 

Learning systems of concepts with an infinite relational model. In Proceedings of the 

National Conference on Artificial Intelligence 21 381. 
Latouche, p., Birmele, E. and Ambroise, C. (2009). Advances in Data Analysis Data 

Handling and Business Intelligence Bayesian methods for graph clustering 229-239. 

Springer. 

Latouche, P., Birmele, E. and Ambroise, C. (2011). Overlapping stochastic block 
models with application to the French political blogosphere. Annals of Applied Statistics 
5 309-336. 

Latouciie, P., Birmele, E. and Ambroise, C. (2012). Variational Bayesian inference 
and complexity control for stochastic block models. Statistical Modelling 12 93-115. 

MiLO, R., Shen-Orr, S., Itzkovitz, S., Kashtan, D., Ciiklovskii, D. and Alon, U. 
(2002). Network motifs: simple building blocks of complex networks. Science 298 824- 
827. 

Moreno, J. L. (1934). Who shall survive?: A new approach to the problem of human 



THE RANDOM SUBGRAPH MODEL 



27 



interrelations. Nervous and Mental Disease Publishing Co. 
NOWICKI, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic 

blockstructures. Journal of the American Statistical Association 96 1077-1087. 
Palla, G., Derenyi, I., Farkas, I. and Vicsek, T. (2005). Uncovering the overlapping 

community structure of complex networks in nature and society. Nature 435 814-818. 
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal 

of the American Statistical association 846-850. 
Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic 

blockmodels for graphs with latent block structure. Journal of Classification 14 75-100. 
Soufiani, H. a. and Airoldi, E. M. (2012). Graphlet decomposition of a weighted 

network. Arxiv preprint arXiv: 1203 ,2821. 
Villa, N., Rossi, F. and Truong, Q. D. (2008). Mining a medieval social network by 

kernel SOM and related methods. Arxiv preprint arXiv:0805.1374. 
White, H. C, Boorman, S. A. and Breiger, R. L. (1976). Social structure from 

multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology 

730-780. 

Xing, E. P., Fu, W. and Song, L. (2010). A state-space mixed membership bloekmodel 
for dynamic network tomography. The Annals of Applied Statistics 4 535-566. 



